Learning an English-Chinese Lexicon from a Parallel Corpus
Abstract
We report experiments on automatic learning of an English-Chinese translation lexicon through statistical training on a large parallel corpus. The learned vocabulary is of nontrivial size: 6,517 English words averaging 2.33 Chinese translations per entry, with a manually filtered precision of 95.1% and a single-most-probable precision of 91.2%. We then introduce a significance filtering method that is fully automatic, yet still yields a weighted precision of 86.0%. Learning of translations is adaptive to the domain. To our knowledge, these are the first empirical results of the kind between an Indo-European and a non-Indo-European language for any significant corpus size with a non-toy vocabulary.
Similar resources
Learning an English-Chinese Lexicon from a Parallel
We report experiments on automatic learning of an English-Chinese translation lexicon, through statistical training on a large parallel corpus. The learned vocabulary size is non-trivial at 6,517 English words averaging 2.33 Chinese translations per entry, with a manually-filtered precision of 95.1% and a single-most-probable precision of 91.2%. We then introduce a significance filtering method that...
Large-Scale Automatic Extraction of an English-Chinese Translation
We report experimental results on automatic extraction of an English-Chinese translation lexicon, by statistical analysis of a large parallel corpus, using limited amounts of linguistic knowledge. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant vocabulary and corpus size. The learned vocabulary size is abo...
Class Based Sense Definition Model for Word Sense Tagging and Disambiguation
We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources including a parallel corpus, a bilingual machine readable dictionary, and a thesaurus. The approach is based on Class Based Sense Definition Model (CBSDM) that generates the glosses and translations for a class of word senses. The model can be applied to resolve sense amb...
Automatic Acquisition of a High-Precision Translation Lexicon from Parallel Chinese-English Corpora
This paper presents a hybrid approach to deriving a translation lexicon from unaligned parallel Chinese-English corpora. Two types of information, namely, proximity and document-external distributions of word pairs, are proposed to enhance the precision of the translation lexicon derived from statistical and dictionary-based methods. The former can identify translations of Chinese compounds, wh...
Co-Training Based Bilingual Sentiment Lexicon Learning
In this paper, we address the issue of bilingual sentiment lexicon learning (BSLL), which aims to automatically and simultaneously generate sentiment words for two languages. The underlying motivation is that sentiment information from two languages can perform iterative mutual-teaching in the learning procedure. We propose to develop two classifiers to determine the sentiment polarities of words...